An Indexing Scheme for Fast Similarity Search in Large Time Series Databases

Authors

  • Eamonn J. Keogh
  • Michael J. Pazzani

Abstract

We address the problem of similarity search in large time series databases. We introduce a novel indexing algorithm that allows faster retrieval. The index is formed by creating bins that contain time series subsequences of approximately the same shape. For each bin, we can quickly calculate a lower bound on the distance between a given query and the most similar element of the bin. This bound allows us to search the bins in best-first order, and to prune some bins from the search space without examining their contents. Additional speedup is obtained by optimizing the data within the bins so that we need not compare the query to every item in the bin. We call our approach STB-indexing and experimentally validate it on space telemetry, medical and synthetic data, demonstrating a speedup of approximately an order of magnitude.
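The combination of a per-bin lower bound, best-first ordering and within-bin pruning can be illustrated with a minimal Python sketch. It is not STB-indexing itself, and the bound used here is an assumption rather than the paper's: each hypothetical `Bin` keeps a centroid and a radius, and the triangle inequality for Euclidean distance gives d(q, x) >= d(q, c) - radius for every member x. Each member also stores its distance to the centroid, so some items can be skipped without computing their full distance to the query. The names `Bin`, `best_first_search` and `euclidean` are illustrative only.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Bin:
    """Hypothetical bin: a centroid plus members, each stored with its
    precomputed distance to the centroid (an assumed layout, not the paper's)."""

    def __init__(self, members: List[Vector]):
        n = len(members[0])
        self.centroid = [sum(m[i] for m in members) / len(members) for i in range(n)]
        self.members = [(euclidean(m, self.centroid), m) for m in members]
        self.radius = max(d for d, _ in self.members)

    def lower_bound(self, query: Vector) -> float:
        # Triangle inequality: d(q, x) >= d(q, c) - d(x, c) >= d(q, c) - radius.
        return max(0.0, euclidean(query, self.centroid) - self.radius)


def best_first_search(bins: List[Bin], query: Vector) -> Tuple[float, Optional[Vector]]:
    """Visit bins in increasing lower-bound order and return the nearest member."""
    heap = [(b.lower_bound(query), i) for i, b in enumerate(bins)]
    heapq.heapify(heap)
    best_dist, best_item = math.inf, None
    while heap:
        lb, i = heapq.heappop(heap)
        if lb >= best_dist:
            break  # all remaining bins have bounds >= lb, so none can do better
        b = bins[i]
        dqc = euclidean(query, b.centroid)
        for dxc, item in b.members:
            # |d(q, c) - d(x, c)| <= d(q, x): skip items that cannot beat best_dist.
            if abs(dqc - dxc) >= best_dist:
                continue
            d = euclidean(query, item)
            if d < best_dist:
                best_dist, best_item = d, item
    return best_dist, best_item
```

For example, with bins `Bin([[0, 0], [1, 0]])` and `Bin([[10, 10], [11, 10]])` and query `[0.9, 0.1]`, the first bin yields a best distance of about 0.14, while the second bin's bound is about 13.3, so the second bin is pruned without its members being examined.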

Similar resources

Efficient Similarity Search for Time Series Data Based on the Minimum Distance

We address the problem of efficient similarity search based on the minimum distance in large time series databases. Most previous work has focused on similarity matching and retrieval of time series based on the Euclidean distance. However, as we demonstrate in this paper, the Euclidean distance has limitations as a similarity measure. It is sensitive to the absolute offsets of time seque...

A Simple Dimensionality Reduction Technique for Fast Similarity Search in Large Time Series Databases

We address the problem of similarity search in large time series databases. We introduce a novel dimensionality reduction technique that supports an indexing algorithm that is more than an order of magnitude faster than the previous best known method. In addition to being much faster, our approach has numerous other advantages. It is simple to understand and implement, allows more flexible dista...

Fast Time Sequence Indexing for Arbitrary Lp Norms

Fast indexing in time sequence databases for similarity searching has attracted a lot of research recently. Most of the proposals, however, have typically centered on the Euclidean distance and its derivatives. We examine the problem of multimodal similarity search, in which users can choose, from multiple similarity models, the one best suited to their needs. In this paper, we present a novel and fast i...

Dimensionality Reduction for Indexing Time Series Based on the Minimum Distance

We address the problem of efficient similarity search based on the minimum distance in large time series databases. To support minimum distance queries, most previous work requires a preprocessing step of vertical shifting. However, vertical shifting adds overhead when building the index. In this paper, we propose a novel dimensionality reduction technique for indexing time s...

Publication date: 1999